Global Statistics in Proximity Weighting Models
Abstract
Information retrieval systems often use proximity or term dependence models to increase the effectiveness of document retrieval. Many existing proximity models examine document-level local statistics, such as the frequency with which pairs of query terms occur within fixed-size windows of each document, before applying standard or adapted weighting functions, for instance Markov Random Fields. Term weighting models use Inverse Document Frequency (IDF) to control the influence of occurrences of different query terms in documents. Similarly, some proximity models also take into account the frequency of pairs of query terms in the entire corpus of documents. However, pair frequency is an expensive statistic to pre-compute at indexing time, or to compute at retrieval time before scoring documents. In this work, we examine, in a uniform setting, the importance of such global statistics for proximity weighting. We investigate two sources of global statistics, namely the target corpus and the entire Web. Experiments are conducted using the TREC GOV2 and ClueWeb09 test collections. Our results show that local statistics alone are sufficient for effective retrieval, and that global statistics usually do not bring any significant improvement in effectiveness compared to the same proximity approaches without these global statistics.
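To make the distinction between local and global statistics concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the weighting functions evaluated in the paper: the function names, the `log1p` damping, the smoothed IDF formula, and the default window size are all hypothetical choices. The local statistic counts fixed-size windows of one document that contain both query terms; the global statistic is an IDF-style weight derived from how many documents in the corpus contain the pair.

```python
import math
from itertools import combinations


def pair_window_count(doc_tokens, t1, t2, window=5):
    """Local statistic: number of fixed-size windows in one document
    that contain both t1 and t2."""
    n = len(doc_tokens)
    if n == 0:
        return 0
    count = 0
    # A document shorter than the window is treated as a single window.
    for start in range(max(1, n - window + 1)):
        win = doc_tokens[start:start + window]
        if t1 in win and t2 in win:
            count += 1
    return count


def pair_idf(corpus, t1, t2, window=5):
    """Global statistic: smoothed IDF of the term pair over the corpus.
    The smoothing (N + 1) / (df + 1) is an assumption for illustration."""
    df = sum(1 for doc in corpus if pair_window_count(doc, t1, t2, window) > 0)
    return math.log((len(corpus) + 1) / (df + 1))


def proximity_score(doc_tokens, query_terms, window=5, corpus=None):
    """Sum a damped pair count over all pairs of distinct query terms.
    If a corpus is given, each pair is additionally weighted by its
    global pair IDF; otherwise only local statistics are used."""
    score = 0.0
    for t1, t2 in combinations(sorted(set(query_terms)), 2):
        local = math.log1p(pair_window_count(doc_tokens, t1, t2, window))
        weight = pair_idf(corpus, t1, t2, window) if corpus is not None else 1.0
        score += weight * local
    return score
```

Note that `pair_idf` has to scan every document in the corpus for each query-term pair, which hints at why pair frequency is costly to obtain either at indexing time or at retrieval time, whereas the local-only variant touches just the document being scored.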
Similar resources
Efficient Dynamic Pruning with Proximity Support
Modern retrieval approaches apply not just single-term weighting models when ranking documents; instead, proximity weighting models are in common use, which highly score the co-occurrence of pairs of query terms in close proximity to each other in documents. The adoption of these proximity weighting models can cause a computational overhead when documents are scored, negatively impacting the eff...
Kernel weighted influence measures
To assess the sensitivity of conclusions to model choices in the context of selection models for non-random dropout, several methods have been developed. None of them is without limitations. A new method called kernel weighted influence is proposed. While global and local influence approaches look upon the influence of cases, this new method looks at the influence of types of observations. The ...
The relative use of proximity, shape similarity, and orientation as visual perceptual grouping cues in tufted capuchin monkeys (Cebus apella) and humans (Homo sapiens).
Recent experimental results suggest that human and nonhuman primates differ in how they process visual information to assemble component parts into global shapes. To assess whether some of the observed differences in perceptual grouping could be accounted for by the prevalence of different grouping factors in different species, we carried out 2 experiments designed to evaluate the relative use ...
Ecological Statistics of Contour Grouping
The Gestalt laws of perceptual organization were originally conceived as qualitative principles, intrinsic to the brain. In this paper, we develop quantitative models for these laws based upon the statistics of natural images. In particular, we study the laws of proximity, good continuation and similarity as they relate to the perceptual organization of contours. We measure the statistical powe...
Alternative GMM estimators for first-order autoregressive panel model: An improving efficiency approach
This paper considers the first-order autoregressive panel model, which is a simple model for dynamic panel data (DPD). The generalized method of moments (GMM) gives efficient estimators for these models. This efficiency is affected by the choice of the weighting matrix used in GMM estimation. Non-optimal weighting matrices have been used in the conventional GMM estimators. ...
Publication date: 2010